16 research outputs found
Approximative filtering of XML documents in a publish/subscribe system
Publish/subscribe systems filter published documents and inform their subscribers about documents matching their interests. Recent systems have focussed on documents or messages sent in XML format. Subscribers have to be familiar with the underlying XML format to create meaningful subscriptions. A service might support several providers with slightly differing formats, e.g., several publishers of books. This makes the definition of a successful subscription almost impossible. This paper proposes the use of an approximative language for subscriptions. We introduce the design of our ApproXFilter algorithm for approximative filtering in a publish/subscribe system. We present the results of our performance analysis of a prototypical implementation.
Dynamische Datenbankorganisation für multimediale Informationssysteme
The topic of this thesis is a mathematically rigorous derivation of formulae for the magnetic force which is exerted on a part of a bounded magnetized body by its surroundings. Firstly, the magnetic force is considered within a continuous system based on macroscopic magnetostatics. The force formula in this setting is called Brown's force formula, after W. F. Brown, who gave a mainly physically motivated discussion of it. This formula contains a surface integral which shows a nonlinear dependence on the normal. Brown assumes the existence of an additional term in the surface force which cancels the nonlinearity to allow an application of Cauchy's theorem in continuum mechanics to a magnetoelastic material. The proof of Brown's formula given in this work involves a suitable regularization of a hypersingular kernel and uses singular integral methods. Secondly, we consider a discrete, periodic setting of magnetic dipoles and formulate the force between a part of a bounded set and its surroundings. In order to pass to the continuum limit, we start from the usual force formula for interacting magnetic dipoles. It turns out that the limit of the discrete force is different from Brown's force formula. One obtains an additional nonlinear surface term which allows one to regard Brown's assumption on the surface force as a consequence of the atomistic approach. Moreover, due to short-range effects, one obtains an additional linear surface term in the continuum limit of the discrete force. This term contains a certain lattice sum which depends on a hypersingular kernel and the underlying lattice structure.
Schnelle Ähnlichkeitssuche in XML-Daten
Title Page and Contents
1. Introduction
2. State of the Art
3. The approXQL Query Language
4. Modeling Documents and Queries
5. Querying by Approximate Tree Embedding
6. Direct Query Evaluation
7. Schema-Driven Query Evaluation
8. Efficient Algorithms for Plan Operators
9. The approXQL Query Engine
10. Experimental Efficiency Analysis
11. Conclusion
Bibliography
A The Grammar of approXQL
B Symbols Used in the Thesis
C Appendices as required by the doctoral regulations
The eXtensible Markup Language (XML) is a widely accepted standard for the
representation of data. The more data is stored in XML documents, the more
important methods for effective and efficient searching become. An important
characteristic of XML documents is their self-describing structure. Queries
that specify selection conditions for the structure promise to greatly improve
the precision of the search. However, the use of the structure can also be
problematic, because it is hard for users to learn all of the details of the
often complex and heterogeneous structure required to phrase a query, and
because structural selection conditions often lead to overspecified queries
that miss relevant results.
In this thesis, we propose an innovative method for searching in XML data,
which uses the descriptive structure as a guide to locate the requested
information. A user needs only partial knowledge of the structure to formulate
queries that specify conditions on both the content and structure of
documents. A query is interpreted in such a way that it retrieves not only
exact matches, but also results considered to be similar to the query. To find
the similar results, sequences of transformations are applied to the query so
that its structure is adapted to the structure of each document in the
collection. Each transformation within a sequence has a cost; the total cost
of a sequence measures the similarity between the original query and a
document matched by the transformed query. This total cost is assigned to the
document and determines its position in the list of results, which is sorted
by decreasing similarity. By adjusting the costs, the interpretation of
queries can be tailored to the needs of different users, and also to the
varied characteristics of XML documents.
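The cost model described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the transformation names, the cost values, and the choice to score a document by its cheapest matching sequence are assumptions made only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical transformation names and costs (assumptions for illustration;
# the thesis defines its own transformations and lets users adjust the costs).
COSTS: Dict[str, float] = {
    "rename_label": 2.0,
    "insert_node": 1.0,
    "delete_node": 3.0,
}


def sequence_cost(sequence: List[str], costs: Dict[str, float]) -> float:
    """Total cost of a transformation sequence: the sum of its step costs."""
    return sum(costs[step] for step in sequence)


def rank(matches: Dict[str, List[List[str]]],
         costs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents by increasing cost, i.e. decreasing similarity.

    `matches` maps a document id to the transformation sequences under which
    the transformed query matches it. Scoring a document by its cheapest
    sequence is an assumption of this sketch.
    """
    scored = [
        (doc, min(sequence_cost(seq, costs) for seq in sequences))
        for doc, sequences in matches.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# An empty sequence stands for an exact match with cost zero.
example_matches = {
    "doc_a": [[]],
    "doc_b": [["insert_node"], ["rename_label", "insert_node"]],
    "doc_c": [["delete_node"]],
}
ranking = rank(example_matches, COSTS)
```

Passing a different cost table to `rank` changes the order of the results without changing the query, which is the sense in which the interpretation can be tailored by adjusting the costs.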
We present all necessary algorithms and data structures to implement a query
processor that answers a query in polynomial (typically sublinear) time with
respect to the size of the database. For a given query, the query processor
creates a compact query-execution plan that represents all possible query
transformations. It evaluates the plan by executing operators that
successively calculate the transformation costs for each document in the
collection. We present techniques to effectively optimize the evaluation of
query-execution plans by exploiting equivalences between operators. To reduce
the query-evaluation times even more, we propose a method to retrieve the best
n results, without computing similarity scores for all documents in the
collection. This method uses a structural summary of the data to estimate the
best k transformed queries, which are successively evaluated until the best n
results are found. The theoretical concepts are validated by a prototypical
implementation. We describe the architecture of the prototype, and discuss the
results of systematic tests carried out to analyze the evaluation times for a
representative set of queries with respect to various collections of real and
synthetic XML documents.
The markup language XML is a widely accepted standard for representing
data. The more data is available as XML documents, the more important
effective and efficient methods for information retrieval become. A feature
of XML documents that is very important for searching is their
self-describing structure. Exploiting this structure makes it possible to
formulate very precise queries. Unfortunately, the document structure can be
very complex and heterogeneous. Users often do not know all structural
details, so formulating accurate queries is difficult, and relevant documents
that do not match the query exactly are frequently missed.
In this thesis, we present an innovative approach to searching XML data that
uses the self-describing structure as a guide for locating the desired
information. A user needs only partial knowledge of the structure to
formulate queries that specify selection conditions on both the content and
the structure of the documents. A query is interpreted so that it finds not
only exactly matching results but also results judged to be similar to the
query. To find these similar results, the query is adapted to the structure
of each document in the collection by means of transformation sequences.
Each individual transformation has a certain cost. The total cost of a
sequence is used to assess the similarity between the original query and the
documents selected by the transformed query. This cost is assigned to the
documents and determines their positions in the result list, which is sorted
by decreasing similarity.
We present all the algorithms and data structures needed to implement a query
processor that answers queries in polynomial (typically even sublinear) time
with respect to the collection size. The query processor evaluates execution
plans consisting of operators. The operators successively compute the
transformation costs. We describe techniques for effectively optimizing
execution plans that exploit equivalences between the operators. To further
reduce the execution time of a query, we propose a method for efficiently
finding the best n results, which uses a structural summary of the documents
in the collection to determine the best k transformed queries. These queries
are evaluated one after another until the best n results are found.
A prototype implementation confirms the efficiency of the methods developed
for query evaluation. We describe the architecture of the prototype and
discuss the results of systematic tests that analyze query-evaluation times
for various collections of real and synthetic XML documents.
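The best-n strategy summarized in this abstract, evaluating transformed queries one after another until n results are found, might be sketched in Python as follows. The function name `best_n`, the `evaluate` callback, and the assumption that the transformed queries arrive already ordered by increasing cost are all illustrative; how the thesis derives the best k transformed queries from the structural summary is not reproduced here.

```python
from typing import Callable, Iterable, List, Set, Tuple


def best_n(transformed_queries: Iterable[Tuple[float, str]],
           evaluate: Callable[[str], List[str]],
           n: int) -> List[Tuple[str, float]]:
    """Collect the n cheapest results without scoring every document.

    `transformed_queries` yields (cost, query) pairs in increasing cost
    (assumed to be supplied by a schema-based estimator). `evaluate` is a
    hypothetical callback returning the ids of documents matched by a query.
    """
    results: List[Tuple[str, float]] = []
    seen: Set[str] = set()
    for cost, query in transformed_queries:
        for doc in evaluate(query):
            # The first query that reaches a document is the cheapest one,
            # because queries are processed in increasing cost.
            if doc not in seen:
                seen.add(doc)
                results.append((doc, cost))
        if len(results) >= n:
            break  # remaining, more expensive queries are never evaluated
    return results[:n]
```

The early `break` is the point of the technique: once n documents have been found, the more expensive transformed queries are skipped entirely.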
Schema-Driven Evaluation of Approximate Tree-Pattern Queries
We present a simple query language for XML, which supports hierarchical, Boolean-connected query patterns. The interpretation of a query is founded on cost-based query transformations: The total cost of a sequence of transformations measures the similarity between the query and the data and is used to rank the results. We introduce two polynomial-time algorithms that efficiently find the best n answers to the query: The first algorithm finds all approximate results, sorts them by increasing cost, and prunes the result list after the nth entry.
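The final sort-and-prune step of the first algorithm can be written in a few lines of Python. This is only a sketch of that last step under the assumption that all approximate results are already available as (document id, cost) pairs; the function name `sort_and_prune` is hypothetical, and finding the approximate results is outside its scope.

```python
from typing import List, Tuple


def sort_and_prune(results: List[Tuple[str, float]],
                   n: int) -> List[Tuple[str, float]]:
    """Sort all approximate results by increasing cost and keep the first n."""
    return sorted(results, key=lambda result: result[1])[:n]
```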
ApproXQL: Design and Implementation of an Approximate Pattern Matching Language for XML
We introduce the simple query language approXQL, which supports hierarchical, Boolean-connected query patterns. The interpretation of approXQL queries is founded on cost-based query transformations: The total cost of a sequence of transformations measures the similarity between a query and the data and is used to rank the results. We describe in detail the implementation of the approXQL query processor, which uses an expanded query representation and sophisticated indexes to compute all results of a query in polynomial (typically sublinear) time with respect to the database size.